ABSTRACT


There are generally considered to be four major types of digital circuit
redundancy techniques in use today: the Shannon-Moore component redundancy
method, the "quadded logic" approach suggested by Tryon, von Neumann's
triplication and majority-voting technique, and the basic standby redundancy
systems. When viewed in the light of design criteria based upon the
requirements of non-maintainable spacecraft, the triplication and
majority-voting scheme appears to be optimal, although more costly in weight,
power, and components than one would prefer. It is the purpose of this paper to
describe a redundancy technique which would require mere duplication to achieve
the same failure-masking capabilities as the von Neumann method.

An analysis of the circuit-failure problem is approached from the viewpoint
of coding theory, with comparisons made between the "noisy channel" and
"circuit-failure" problems. Some of the difficulties of extrapolating from
the former to the latter are discussed, as well as recent attempts to minimize
the redundancy "overhead" by coding over larger numbers of bits.

Following a description of the binary erasure channel model, a proposal
of a failure-erasure technique based upon it is outlined. The method enables
failure-masking at duplicative rather than triplicative costs. There are
constraints  which  this  scheme  imposes  upon  the  circuit  elements,  however,  and 
the  characteristics  of  the  ideal  circuit  element  and  logic  signaling  are 
proposed.  The  paper  concludes  with  a  discussion  of  existing  hardware  which 
approximates  the  desired  characteristics. 


Accepted  for  the  Air  Force 
Stanley  J.  Wisniewski 
Lt  Colonel,  USAF 
Chief,  Lincoln  Laboratory  Office 




TABLE  OF  CONTENTS 


INTRODUCTION

EXISTING REDUNDANCY TECHNIQUES

COMMUNICATION CHANNEL ANALOGS

THE DESIGN OF FAILURE-ERASURE CIRCUITRY

CONCLUSIONS

ACKNOWLEDGMENT




INTRODUCTION 


Foremost  among  the  requirements  for  electronics  of  the  future  is  the 
ability  of  these  systems  to  perform  reliably  for  exceptionally  long  periods 
of  time  without  need  for  maintenance  of  any  sort.  Pioneering  efforts  in  this 
direction  were  made  primarily  in  the  submarine  cable  area  but  it  was  only 
with  the  advent  of  complex  airborne  electronic  systems  that  terms  like 
"quality  control",  "maintainability"  and  "MTBF"  made  their  appearance  in  the 
technical literature. The demands which non-maintainable space-borne
electronics systems have made upon component test and evaluation and associated
areas have forced electronic components to reach very high levels of
reliability.

The fact remains, nevertheless, that while components are more reliable
than ever, the increasing complexity of space-borne electronics, coupled with
the need for exceptionally long operating lifetimes of five to ten years,
leaves little hope that reliable components alone will achieve the desired
goals. There is always a finite probability that the first failure will occur
much sooner than the anticipated mean time. In designing an exceptionally
reliable system this possibility must always be kept in mind, and the cost of
adding failure-masking techniques (perhaps just in critical areas) should be
weighed. Redundancy has drawbacks of its own, particularly in terms of
additional weight and power. It is the purpose of this paper to propose a
redundancy technique which will reduce the traditional redundancy "overhead"
by about fifty per cent.

EXISTING  REDUNDANCY  TECHNIQUES 

Before entering into a discussion of the communication-channel analogs of
failure-masking techniques, it is well to review what the authors consider
the major contemporary techniques, each currently employed to one extent or
another. The four schemes to be reviewed are discussed in more detail
elsewhere¹ and, accordingly, only brief descriptions will be given in this
paper.

¹An excellent single source of information on redundancy schemes is "Redundancy
Techniques for Computing Systems", edited by Wilcox and Mann, Spartan Books,
1962.
The first redundancy technique to be considered is normally applied at
the component level and is probably the redundancy method most used today.
Called the Shannon-Moore scheme,² it relies upon knowing the probability of
a particular failure mode (open or short circuit). In the extreme case where
a component will only fail in a short-circuit mode, the redundant
configuration of Figure 1a should be used. Where the component will only fail
as an open circuit, the configuration of Figure 1b is an obvious choice.
Figure 1c is used for a component whose short-failure probability is equal to
its open-failure probability. The original paper referred to redundant relays,
which were open or closed, by design or otherwise. The use of this technique
for other types of components must be very carefully considered in order to
preserve the basic tenet of this, and any other, redundancy scheme, which is
the statistical independence of the failures. This technique may be used in
linear circuitry as well, but again must be applied with caution. A failure
occurring within a redundant set should not increase the stress level on the
other components within the set. A difficulty with this system, and with most
component redundancy schemes, above and beyond that of power and weight
increases, is that of determining where failures exist prior to the equipment
entering its critical non-maintainable period of operation. Component
redundancy makes this somewhat difficult to achieve. Some schemes have been
employed in which the redundant configurations are split into two distinct
systems which are independently tested and, upon successful completion of
these tests, reconnected into one. Such techniques generally do not, however,
verify that all components are working. The increased usage of integrated
circuitry, some types of which display a relative lack of isolation between
circuit elements, may further limit the use of this scheme. The component cost
of this technique is generally about a factor of four above that of the
non-redundant scheme. As components fail, power consumption increases. Because
of the change in the equivalent component impedance as failures occur, it is
not prudent to use such methods where component values must be precisely
controlled.

²Moore, E. F. and C. E. Shannon, "Reliable Circuits Using Less Reliable
Relays", Journal of the Franklin Institute, 1956.
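
The trade-off between open and short failures in component redundancy can be
illustrated numerically. The sketch below is not taken from the original
paper: it assumes that each component fails open with probability p_open and
short with probability p_short, that failures are statistically independent,
and that the configurations are a two-component series string, a two-component
parallel pair, and a four-component series-parallel net. The function names
are hypothetical, and the correspondence to Figures 1a to 1c is an assumption.

```python
def series_pair(p_open, p_short):
    """Two identical components in series.

    Returns (P(network open), P(network short)).
    The pair is open if either component opens, shorted only if both short.
    """
    return 1 - (1 - p_open) ** 2, p_short ** 2


def parallel_pair(p_open, p_short):
    """Two identical components in parallel.

    The pair is open only if both open, shorted if either one shorts.
    """
    return p_open ** 2, 1 - (1 - p_short) ** 2


def quad(p_open, p_short):
    """Two series pairs connected in parallel (one possible four-component net)."""
    branch_open, branch_short = series_pair(p_open, p_short)
    return parallel_pair(branch_open, branch_short)


if __name__ == "__main__":
    p = 0.01  # assumed per-component probability of each failure mode
    for name, net in (("series", series_pair),
                      ("parallel", parallel_pair),
                      ("quad", quad)):
        p_o, p_s = net(p, p)
        print(f"{name:8s} open={p_o:.6f} short={p_s:.6f} total={p_o + p_s:.6f}")
```

Under these assumptions the series string suppresses shorts at the expense of
opens, the parallel pair does the reverse, and the four-component net reduces
both, which is consistent with the factor-of-four component cost quoted above.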

Another scheme for failure-masking is called "quadded logic" or Tryon's
method;³ this technique operates best at the circuit level. It is based
upon quadruplication of circuitry, and an error is usually corrected within
two or three levels of logic "downstream" from the initial failure. Correction
is based upon receiving correct signals from the previous levels, which are
interwoven in connection networks to minimize propagation of the error.

Boolean expressions which contain "don't care" terms actually do the
correction. For example,

$$1 + \underline{0} = 1 + \underline{1} = 1$$
$$0 \times \underline{1} = 0 \times \underline{0} = 0$$

where the underlined term is the "don't care" or "correctable" term for that
particular expression. A logical network is shown in Figure 2a and its quadded
equivalent is shown in Figure 2b.
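
The masking role of the "don't care" term can be checked by enumeration. The
following is only an illustrative sketch; the helper name controlling_values
is hypothetical and does not come from Tryon's paper. It finds, for a
two-input gate, the value of one input that fixes the output regardless of the
other (possibly erroneous) input.

```python
def controlling_values(gate):
    """Values v of the first input for which gate(v, b) ignores b entirely."""
    return [v for v in (0, 1) if gate(v, 0) == gate(v, 1)]


def logical_or(a, b):
    return a | b


def logical_and(a, b):
    return a & b


if __name__ == "__main__":
    # A correct 1 at an OR gate, or a correct 0 at an AND gate, hides an
    # error on the other input.
    print("OR :", controlling_values(logical_or))
    print("AND:", controlling_values(logical_and))
```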

There  are  a  number  of  disadvantages  to  this  technique,  not  the  least  of 
which  is  the  quadruplication  of  circuits  and  the  interconnection  morass,  which 
begets  unreliability.  Power  consumption  is  quadrupled  also.  The  debugging 
of  such  a  system  is  very  difficult  and  on  a  par  with  the  Shannon-Moore  scheme. 
The use of this method for timing circuitry, such as astable or monostable
multivibrators, is also troublesome.

The third major method of implementing redundant systems is the
majority-voting scheme originally proposed by von Neumann.⁴ The technique
involves triplication, at least, with restoration of the desired signal
accomplished through a "restorer" which is a majority-voting logic element.
This type of redundancy is usually applied at a subsystem or system level and
is shown in block-diagram form in Figure 3a. In order to guard against a
failure of the voting circuit, a redundant system using this method may be
arranged as shown in Figure 3b.

³Tryon, J. G., "Quadded Logic" in Wilcox and Mann, op. cit., p. 205.

⁴von Neumann, J., "Probabilistic Logics and the Synthesis of Reliable Organisms
from Unreliable Components", Automata Studies, Annals of Mathematics Studies,
No. 34, Princeton University Press, 1956.

From the standpoint of prelaunch debugging, this method offers the greatest
advantage since the system may be trisected into three non-redundant systems,
a failure in any of which is readily detectable. After debugging, the portions
are reconnected. The power and weight of such a system are multiplied by a
factor slightly greater than three. The circuitry used may be of the
conventional variety and commercially available integrated circuits may be
employed. Free-running timing circuitry still presents problems in this
scheme. Care must also be taken in turning on such a system since certain
subsystem circuitry, such as counters, must have the same starting point. Since
a transient noise condition can make a counter disagree with its two redundant
counters (triplication is assumed), feedback of the restored signal might be
employed to correct the transient-induced miscounting. Feedback shift
registers may be used as counters to advantage under these circumstances, as
shown in Figure 4.
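
The triplication and voting scheme, including the feedback of the restored
signal to the counters, can be sketched in a few lines. This is a minimal
illustration rather than the circuitry of Figures 3 and 4: it assumes a
perfect voter, independent copies each working with probability r, and
counters modeled as plain integers. All names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise two-out-of-three vote; works on single bits or whole words."""
    return (a & b) | (b & c) | (a & c)


def tmr_reliability(r):
    """P(at least two of three independent copies work), perfect voter assumed."""
    return 3 * r ** 2 - 2 * r ** 3


if __name__ == "__main__":
    counters = [5, 5, 5]          # three redundant counters in step
    counters[1] ^= 0b100          # a transient upsets one of them
    restored = majority(*counters)
    counters = [restored] * 3     # feed the restored value back to all copies
    print(counters, tmr_reliability(0.9))
```

Note that the reliability expression exceeds r only when r is greater than
one half, which is one reason the approach is applied to units that are
already fairly reliable.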

An interesting variation of the straightforward majority-voting scheme
has been proposed by W. H. Pierce⁵ and is based upon adaptive weighting of
the signals entering the signal restorer, or vote-taker.

The final redundancy scheme, "standby" redundancy, is perhaps the most
basic. In this method a system is operated with one or more identical systems
kept in parallel but not operating. When the operating system is detected as
being in error or as having failed, a signal is generated which turns on the
next "standby" system. The difficulty with this arrangement is the detection
of the error, or failed system, with a minimum number of available operating
systems. More will be said of this problem shortly; suffice it to say at this
time that it is a rather complex problem involving minimum-redundancy design.
However, if the failure mode to be masked is comparatively simple, this
technique can be used effectively.

⁵Pierce, W. H., "Adaptive Vote-Takers Improve the Use of Redundancy",
Redundancy Techniques for Computing Systems, op. cit.

To conclude this review of the major redundancy schemes, it appears that,
on the basis of ease of prelaunch checkout and minimum numbers of components,
the triplication and majority-voter technique is the most feasible, although
there are areas within a digital system, such as time base generators, where
one would find another approach more practical. The power and weight penalty
is a factor slightly more than three. It would be of significant advantage if
a minimum-redundancy scheme could be evolved which would allow much smaller
"overhead" for failure-masking systems.

COMMUNICATION  CHANNEL  ANALOGS 

In  this  section,  suggestions  deriving  from  coding  and  communication 
theory  as  tools  to  generate  redundant  digital  systems  are  discussed.  The 
approach  taken  is  to  discuss  the  traditional  usage  of  coding  theory  in  the 
transmission  of  digital  information,  then  to  show  the  differences  between  such 
traditional usage and the problem at hand. It is seen that the type of failure
considered here, catastrophic circuit failure, is handled only by the
paralleling of circuits. An example of the use of coding theory to determine
the optimum degree of parallelism is given and the limitations discussed. An
alternate technique based on the erasure channel is then developed.

A block diagram of a simple digital communications system is shown in
Figure 5. This system consists of the series connection of an information
source, an encoder, a transmitter, a channel, a receiver, a decoder, and an
information user. Typically the information source supplies messages in
binary form; also it is generally assumed that as signals propagate through
the channel they are corrupted or perturbed by some kind of noise. The encoder
and decoder operate as a pair so that the binary messages can be transmitted
from information source to information user with the least probability of
error.

The usual problem associated with such digital systems is the design of
the encoder-decoder. Although coding techniques have been developed for a
perfectly noiseless channel, the major emphasis has always been on systems in
which the channel has noise characteristics. In particular, it is the purpose
of these codes to detect, and sometimes to detect and correct, the resulting
errors.⁶ Generally these codes require that the occurrence of errors be
independent of digit position. This is statistical independence of errors.
Certain other codes have been developed which will correct bursts of errors in
a message. In fact, a fundamental theorem of coding tells us that it is
possible to encode and to decode in such a fashion that (consistent with the
channel capacity) the probability of incorrect decoding can be made
arbitrarily small.⁷ Unfortunately, no straightforward design path for the
realization of this exists. Typical coding techniques exchange error
correction capability for rate of information transmission.

The block diagram of Figure 5 shows a series connection which, as
mentioned previously, is typical. There are many different sources of noise
which can perturb the signals in the channel. Whatever the noise source, the
principal effect of the noise, signal corruption, is relatively transient.
That is, one does not find message after message completely incorrect (prior
to decoding); rather, one finds errors here and there, scattered throughout
the word length. Completely incorrect messages would be indicative of a
catastrophic failure in the series connection. No amount of coding dexterity
will correct the errors due to such a catastrophic failure in a series
connection.

⁶Peterson, W. W., "Error-Correcting Codes", M.I.T. Press and John Wiley & Sons,
1961.

⁷Shannon, C. E., "The Mathematical Theory of Communication", BSTJ, pp. 379-423;
1948.

It is seen, therefore, that codes can be effective against transient-type
noise in a series-connected system but that they are useless against
catastrophic failures.

In  most  digital  systems  the  noise  perturbing  the  channel  is  generally  not 
excessive  due  to  the  fact  that  the  channel  may  be  simply  a  wire  connection 
running  a  distance  of  several  feet.  The  primary  cause  of  error  is  the  failure 
of  circuits.  Circuit  failure  can  be  transient  in  nature,  due  to  component  and 
power  supply  drift,  poor  interconnections  causing  intermittent  failure,  and 
the  like.  Even  with  the  series  connection,  as  long  as  the  transient  errors 
are  truly  transient,  then  the  use  of  codes  would  be  beneficial. 

On  the  other  hand,  circuit  failure  can  be  catastrophic  with  the  net  result 
that  the  series  information  flow  chain  is  permanently  broken.  These  failures 
can  come  about  from  extreme  voltage  and  component  variation,  opens  and  shorts, 
and  so  forth.  Coding  cannot  help  here  when  a  series  connection  is  used. 

If it is desired to correct errors due to catastrophic circuit failures,
then it is clear that the series connection of elements is useless and some
kind of parallel arrangement is required. Various kinds of parallel
arrangements were discussed in the previous section; these parallel
configurations differ in degree of parallelism in the sense of the amount of
extra hardware, space, power consumption, and cost. Error correction
techniques in the form of coding can help to determine an optimum degree of
parallelism.

It is extremely important to realize that at this point we have shifted
our attention from the use of coding to generate time redundancy to the use
of coding to generate parallel redundancy because of the inherent differences
between the failure modes. Consider a circuit, as shown in Figure 6,
consisting of a Boolean function element and parity operators. If a Hamming
single error correcting code is used, then a word three digits long can be
used with positions 1 and 2 containing the parity digits for the information
digit in position 3.

The  classical  coding  approach  would  be  to  determine  the  position  of  the 
incorrect  bit  through  the  parity  check  equations.  The  correction  would  be 
accomplished  by  complementing  the  bit  in  the  position  indicated.  Some  thought 
about  this  reveals  that  a  non-trivial  amount  of  logic  is  required  to  perform 
the  implementation. 

However, if words 000 and 111 are chosen, corresponding to the information
digit being 0 and 1 respectively, it can readily be verified that the parity
operators are the same as the function element, as shown in Figure 7. If the
majority vote taker is used as a decoder, as shown in Figure 8, then the
implementation of error correction can readily be carried out. It is desirable
to retain the parity check to detect an error since the majority vote taker
automatically carries out the correction, thus masking the fact that an error
occurred.
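
The relationship between the three-digit Hamming code and triplication can be
made concrete. The sketch below assumes the standard Hamming arrangement in
which check 1 covers positions 1 and 3 and check 2 covers positions 2 and 3;
the function names are hypothetical. The syndrome locates a single error (the
classical approach), while the majority vote corrects it directly, so the
parity check is kept only to report that an error occurred.

```python
def encode(bit):
    """Code words 000 and 111: both parity digits equal the information digit."""
    return (bit, bit, bit)


def syndrome(word):
    """Return 0 if the parity checks pass, else the position (1..3) in error."""
    b1, b2, b3 = word
    s1 = b1 ^ b3   # check over positions 1 and 3
    s2 = b2 ^ b3   # check over positions 2 and 3
    return 2 * s2 + s1


def decode(word):
    """Majority vote over the three digits; masks any single error."""
    b1, b2, b3 = word
    return (b1 & b2) | (b2 & b3) | (b1 & b3)


if __name__ == "__main__":
    received = (1, 1, 0)   # code word 111 with position 3 in error
    print(syndrome(received), decode(received))
```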

A similar technique was previously considered by Armstrong,⁸ who carried
it further in an attempt to take advantage of the fact that as the number of
information digits per message grows, the ratio of check digits to
information digits goes down. Armstrong's arguments concern a digital system
with m inputs and n outputs. The system must be such that it can be broken
down into r electrically independent subunits, each subunit carrying not more
than p of the n outputs. The diagram is shown in Figure 9. Consider the
following matrix, in which the outputs of each subunit are displayed in a
separate row with p entries per row. The (q-r) additional subunits provide the
necessary check digits.


⁸Armstrong, D. B., "A General Method of Applying Error Correction to
Synchronous Digital Systems", BSTJ, pp. 577-593; March 1961.
                 0 0 0 . . . 0     outputs from subunit 1
                 0 0 0 . . . 0     outputs from subunit 2
    r rows       . . .
                 0 0 0 . . . 0     outputs from subunit r
                 X X X . . . X     outputs from subunit r + 1
    q-r rows     . . .
                 X X X . . . X     outputs from subunit q

    (q rows in all)

If the error to be corrected is assumed to occur in only one subunit,
then at most a single row can have errors. Hamming single error correcting
codes can be used with each column regarded as a message word. It is clear
that the cost of redundancy can be made much less than for the triplication
scheme. Ray-Chaudhuri⁹ has shown that since the errors can occur only along
a single row, that is, the error position in each word must be common to all
words, codes which require fewer parity bits for the same message length
than Hamming codes are feasible.
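
The falling ratio of check rows to information rows can be tabulated from the
standard condition for a single-error-correcting Hamming code: k check digits
suffice for r information digits when 2^k >= r + k + 1. The sketch below only
counts rows under that condition; it says nothing about the cost of the
correction logic, and the function name is hypothetical.

```python
def hamming_check_digits(r):
    """Smallest k with 2**k >= r + k + 1 (single-error-correcting Hamming code)."""
    k = 0
    while 2 ** k < r + k + 1:
        k += 1
    return k


if __name__ == "__main__":
    for r in (1, 4, 11, 26):
        k = hamming_check_digits(r)   # k plays the role of q - r in the text
        print(f"r={r:2d} information rows, {k} check rows, ratio {k / r:.2f}")
```

With r = 1 the result is two check rows, which is simply triplication; the
ratio drops steadily as more subunit rows share the same check rows.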

The  equipment  redundancy  is  essentially  created  by  the  parity  check 
circuits  and  the  error  correcting  circuits.  The  error  correcting  circuits, 
as  noted  previously,  are  quite  difficult  to  realize  and  it  is  here  that  certain 
of  the  advantages  gained  by  coding  are  offset.  Armstrong's  estimate  of  the 
total  equipment  redundancy  is  a  factor  of  three  for  most  systems.  A  simple 
example  of  the  technique  is  shown  in  Figure  10.  The  objective  is  to  select 
one  of  (xy,  x'y,  x'y',  xy')  given  (x,  x',  y,  y'). 

The nonredundant logic requires four AND gates and the generation of the
three parity bits requires three OR gates. In this case the redundancy
introduced is less than the original system. Minimization of the error
correction logic will, however, significantly increase the overhead. It
should be clear that coding theory can lead the way to a rigorous
determination of the optimum degree of circuit parallelism, but to date even
the most sophisticated schemes are expensive and cumbersome, primarily due to
the complexity of the decoding and error correction mechanism.

⁹Ray-Chaudhuri, D., "On the Construction of Minimally Redundant Reliable
System Designs", BSTJ, pp. 595-611; March 1961.

It is felt that it should be possible to translate certain other
communication theory ideas into digital circuit design specifications such
that the degree of total equipment redundancy can be made lower.

Consider a binary erasure channel (BEC) as shown in Figure 11. The
channel diagram shows that if a 0 is transmitted it will either be received
as a 0 with probability p or received as a no decision (equivalent to "I
don't know whether the bit sent was 0 or 1") with probability q = 1 - p.
Similar remarks apply to a 1 being transmitted.
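
The defining property of the erasure channel, that a digit is either delivered
intact or replaced by a "no decision" but never inverted, can be sketched as
follows. The representation of "no decision" by Python's None, the names bec
and erasure_rate, and the trial count are all assumptions made for
illustration.

```python
import random

ND = None  # stand-in for the "no decision" (erasure) output


def bec(bit, q, rng=random):
    """Pass `bit` through a binary erasure channel with erasure probability q."""
    return ND if rng.random() < q else bit


def erasure_rate(q, trials=10_000, seed=0):
    """Fraction of transmitted digits that come out as ND in a seeded run."""
    rng = random.Random(seed)
    erased = sum(bec(1, q, rng) is ND for _ in range(trials))
    return erased / trials


if __name__ == "__main__":
    print(erasure_rate(0.25))
```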

Suppose  that  one  had  available  digital  circuits  such  that  a  model  for 
their  operation  would  be  similar  to  the  BEC.  These  circuits  would  have 
specified  failure  modes  which  would  be  equivalent  to  the  no  decision  node  of 
the  BEC.  In  this  case  a  simple  duplication  of  the  original  circuit  and  the 
addition  of  an  OR  circuit  can  correct  any  single  error  provided  that  the 
failure  modes  are  such  that  the  OR  circuits  are  insensitive  to  them.  A 
diagram  of  such  a  system  is  shown  in  Figure  12.  The  model  for  the  operation 
of either the A or the B system, shown in Figure 13, indicates that the system
has n inputs or 2^n distinct input states. When the system operates correctly
each  input  state  has  a  correct  output  state  which  is  either  0  or  1  depending 
on  the  particular  input  state  and  the  logical  function  of  the  system.  In 
addition,  every  input  state  has  a  path  to  a  no  decision  state.  The  truth 
table  for  the  OR  circuit  is  shown  below. 


    A (output)    B (output)    OR (output)

        0             0              0
        1             1              1
        ND            0              0
        ND            1              1
        0             ND             0
        1             ND             1
        ND            ND             ND

    (ND = no decision)
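
The truth table can be written out as a small function and checked row by row.
In this sketch "no decision" is represented by None, and the case in which the
two systems deliver opposite definite digits, which the table does not
contain, is treated as an error; both choices are assumptions of the sketch.

```python
ND = None  # "no decision"


def erasure_or(a, b):
    """Combine the outputs of systems A and B as in the truth table."""
    if a is ND:
        return b
    if b is ND:
        return a
    if a != b:
        # Not a row of the table: the erasure model assumes a failed
        # system yields ND, never a wrong definite digit.
        raise ValueError("inputs disagree; outside the erasure model")
    return a


TRUTH_TABLE = [
    (0, 0, 0), (1, 1, 1),
    (ND, 0, 0), (ND, 1, 1),
    (0, ND, 0), (1, ND, 1),
    (ND, ND, ND),
]

if __name__ == "__main__":
    for a, b, expected in TRUTH_TABLE:
        print(a, b, erasure_or(a, b) == expected)
```

If the two systems erase independently, each with probability q, the combined
output is ND only when both erase, that is with probability q squared.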
The OR element operational model is shown in Figure 14. This model
implements the truth table and illustrates the operating philosophy of the
binary erasure channel as applied to the enhancement of the reliability of
operating digital systems. In an operational configuration, the circuit
elements would be disposed as shown in Figure 15, in order that the two
distinct systems may be separated for individual checkout. With each OR
circuit receiving but one signal and a "no decision" from the unconnected
input wire, the individual sections should still work perfectly.

The  actual  design  of  electronic  circuits  possessing  the  characteristics 
specified  here  will  be  considered  in  the  following  section. 

THE  DESIGN  OF  FAILURE-ERASURE  CIRCUITRY 

The basic requirement of failure-erasure circuitry is that any abnormal
component parameter drift or catastrophic failure have an immediate influence
upon the transfer characteristics of that logic network such that the
subsequent logic elements will not respond to the output signals of the faulty
element. In the failure-masking system, the OR "decoder" must not malfunction
if one of the inputs carries a "no decision" signal.

There are then several characteristics which must be designed into
failure-erasure circuitry. A high degree of inter-component dependence,
leading to tight control over the output characteristics, is essential.
Knowledge of the failure modes of the components within the networks is
vital, as is the use of output signals which are less likely to be
misconstrued in decoding. The relative isolation achievable between duplicated
networks must be high so that a failure of one network does not interact with
the duplicated network. The reliability of the decoder must be exceptional.

Let us now look into these characteristics in further detail and attempt
to synthesize the type of logical element desired. The most formidable of
the desired features is the ability of any component failure, catastrophic or
severe drift, within the network to drastically influence the network output
in such a way as to yield the ND signal, or any signal which is obviously not
a "1" or "0". It would appear that the most feasible logical element, as well
as the most reliable one, is one that has a minimum of components. For example,
in a simple parallel RLC circuit an open resistor would make itself known by
an increase in output amplitude and a catastrophic short in the capacitor
would put an abrupt end to the oscillation of the tank. As one departs from
the three-parameter network into more complex networks, the contribution of
each component becomes increasingly masked by the peripheral components until
a single component failure in a network may scarcely influence the form of the
output signals of that network but may still yield faulty information. Indeed,
since most logic circuitry employs transistors used as switches, a transistor
which fails catastrophically as an open or a short will yield an output signal
which is identical to one of its operational output signals.

This point might be enlarged upon to stress the fact that linear logic
signals, by reason of their higher information content, are more amenable to
failure detection techniques. The classical input-output characteristics of
the digital gate, usually with a saturating active element used as an inverter,
would look like Figure 16, with the normal operating tolerances contributing
to the widening of the operational areas. The "0" and "1" signals are
defined by V_0 and V_1 in that anything less than V_0 is a "0" and anything
greater than V_1 is a "1". Unfortunately, these same regions are also the
failure mode regions for catastrophic failures.

By use of a linear logic signal system, the transfer characteristics
would become similar to Figure 17. There is now a greater amount of signal
variation possible, thereby providing higher information content. The decisions
which a failure-detection system might make clearly indicate the ease of
picking out the logic gates where catastrophic failures have occurred. Linear
operation is very costly, however, by reason of the component tolerances
required, and is particularly bad in the space environment, where the
peculiarities of that environment, such as radiation and vacuum, make tight
parameter control very difficult. Future effort on linear logic techniques
may, however, produce significant results.
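
The contrast between the two signaling systems can be illustrated with a pair
of decision rules. The sketch below is hypothetical throughout: the threshold
and band voltages are invented for illustration and are not read from Figures
16 and 17. It shows only that an output stuck at ground or at the supply rail
is read as a valid digit by a saturating gate, but falls outside both valid
bands of a windowed (linear) scheme and can therefore be treated as "no
decision".

```python
def classify_saturating(v, v0, v1):
    """Conventional gate: below v0 is "0", above v1 is "1"."""
    if v < v0:
        return "0"
    if v > v1:
        return "1"
    return "ND"


def classify_windowed(v, zero_band, one_band):
    """Linear signaling: only voltages inside a narrow band count as a digit."""
    low, high = zero_band
    if low <= v <= high:
        return "0"
    low, high = one_band
    if low <= v <= high:
        return "1"
    return "ND"


if __name__ == "__main__":
    # Assumed values: 5 V supply, thresholds at 1 V and 3 V,
    # valid bands of 1.0-1.5 V for "0" and 3.0-3.5 V for "1".
    for stuck in (0.0, 5.0):
        print(stuck,
              classify_saturating(stuck, 1.0, 3.0),
              classify_windowed(stuck, (1.0, 1.5), (3.0, 3.5)))
```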

There  is  another  facet  to  the  statement  that  logic  signals  should  look 
as  unlike  the  failure  mode  signals  as  possible.  Where  the  majority  of  failures 
are  catastrophic  opens  or  shorts,  the  use  of  static  logic  (d.c.  levels)  signals 
invites  trouble.  The  communications  practitioner  will  always  endeavor  to  make 
his  channel  signals  look  as  much  unlike  the  anticipated  channel  noise  as 
possible.  If  one  looks  upon  the  wires  from  one  logic  element  to  another  as 
communications  channels,  then  the  same  philosophy  should  apply.  It  would  seem 
that substantial isolation of catastrophic failures would be gained by a.c.
logic signaling systems. Such isolation of elements is necessary in any
redundancy scheme in order to prevent a faulty system from totally disabling
the decoder which is to correct the fault. By preventing the propagation of
the results of a fault over a large portion of the logic system and
constraining the area of influence of the failure, the number of failures
which a logic system can withstand within itself may be considerably increased
without loss of its failure-masking capabilities.

At the present time there are no devices and/or circuits available which
exactly fulfill the requirements of failure-erasure circuitry. However,
certain of these requirements can be approximated. As an example of existent
hardware which, to some extent, has the properties and characteristics
outlined above, we would like to consider the parametron.

A parametron element is essentially a resonant circuit with a reactive
element varying periodically at frequency 2f, which generates a parametric
oscillation at the subharmonic frequency f. In practice, the periodic
variation is accomplished by applying an exciting current of frequency 2f to a
balanced pair of non-linear reactors.
A non-linear inductance type was invented in 1954 by Eiichi Goto at
Tokyo University.^ A non-linear capacitance type was suggested by
John von Neumann in the United States in 1954.* The non-linear inductance
type has been widely utilized in Japan as the primary logical element in
digital computers. The non-linear capacitance type has been used in the
United States, with a varactor diode serving as the capacitor.

The subharmonic parametric oscillation generated has the remarkable
property that the oscillation will be stable in either of two phases which
differ by π radians with respect to each other. Utilizing this fact, a
parametron represents and stores one binary digit, "0" or "1", by the choice
between these two phases, 0 or π radians.

If  the  inductance  type  circuit  shown  in  Figure  18  is  tuned  to  f,  then 
the  output  will  build  up  exponentially.  The  phase  of  the  output  will  follow 
the  phase  of  the  input  which  is  determined  by  the  algebraic  summing  action 
of  the  coupling  transformer  at  the  input.  It  is  this  majority  vote  which 
allows  the  device  to  be  utilized  as  a  logical  element.  A  similar  description 
applies  to  the  varactor  diode  type. 

Since oscillation in either of the two stationary states is extremely
stable, the application of the opposite phase signal at the input during
oscillation will have no effect on the parametron. The exciting a.c. signal
must be reduced to zero and then increased again in order to change the
output phase.
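
The majority-vote input and the phase-locking behavior described above can be illustrated with a small model. The following Python sketch is an idealization, not a circuit simulation: the class name `Parametron`, its methods, and the encoding of phase 0 as +1 and phase π as -1 are assumptions made here for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumed encoding for this sketch: +1 stands for phase 0, -1 for phase pi.
PHASE_0 = +1
PHASE_PI = -1


@dataclass
class Parametron:
    """Idealized parametron: majority-vote input, phase held while excited."""

    phase: Optional[int] = None  # None means the excitation is off

    def quench(self) -> None:
        """Reduce the 2f excitation to zero; the oscillation dies out."""
        self.phase = None

    def excite(self, inputs: Sequence[int]) -> int:
        """Raise the excitation; oscillation builds up in the majority phase."""
        if self.phase is not None:
            # Already oscillating: an opposite-phase input has no effect.
            return self.phase
        total = sum(inputs)  # algebraic summing at the coupling transformer
        if total == 0:
            raise ValueError("tie: use an odd number of inputs")
        self.phase = PHASE_0 if total > 0 else PHASE_PI
        return self.phase
```

In this model, calling `excite` a second time with opposite-phase inputs returns the phase already held; only a call to `quench` followed by `excite` lets the inputs set a new phase.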

Goto has shown that as the resonant circuit of the inductance type
parametron is detuned by varying L or C, the subharmonic oscillator frequency
remains constant but the amplitude changes as shown in Figure 19. Significant
detuning in one direction causes the output to go to zero; in the other
direction a tristable region exhibiting hysteresis is encountered. The
tristable region has three stable states: 0 phase oscillation, π phase
oscillation, and no oscillation. (It is interesting to note that these three
states could form the basis of a ternary device.) Detuning past this tristate
region causes the output to go to zero.

^ Goto, E., "The Parametron, A Digital Computing Element Which Utilizes
Parametric Oscillation", Proc. I.R.E., Vol. 47, pp. 1304-1316; August 1959.

* A discussion of von Neumann's original patent application appears in "A
New Concept in Computing", R. L. Wigington, Proc. I.R.E., pp. 516-523; April
1959.

The parametron fits the specifications for a binary erasure channel "type"
of electronic digital circuit quite well. Given a logical function to perform,
the parametron uses a majority vote scheme at its input and then simply
amplifies the result. The amplifier is such that sufficiently large changes in
its components will eventually cause the output to go to zero. Similarly,
a failure in the input transformer could be disastrous. The channel
model for the parametron could be like that shown in Figure 20. The
probability of going to the correct output state for any input state is p, the
probability of going to the no decision output state from any input state is
q, and the probability of going to the incorrect output state from any input
state is r (primarily due to the tristate region). The parametron digital
circuit will have p > q and q > r. This is a distinct advantage over
contemporary digital circuits, in which a "no decision" mode does not exist at
all. Because of the inherent stability of the passive components used, r should
be almost zero. The complete circuit to utilize the duplicative redundancy
scheme would be exactly as shown in Figure 12 or Figure 15.
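
The effect of duplication on the channel probabilities p, q and r can be worked out numerically. The Python sketch below rests on assumptions that are not stated in the text: the two copies fail independently, the failure-erasure OR passes a decision whenever at least one copy decides, and a disagreement between a correct and an incorrect copy is counted separately as a "conflict". The function name and the sample values are hypothetical.

```python
from typing import Dict


def duplicated_outcomes(p: float, q: float, r: float) -> Dict[str, float]:
    """Outcome probabilities for two independent copies of the same element.

    p: probability of the correct output, q: no decision, r: incorrect output.
    """
    if abs(p + q + r - 1.0) > 1e-9:
        raise ValueError("p, q and r must sum to 1")
    return {
        # At least one copy correct, the other correct or erased.
        "correct": p * p + 2 * p * q,
        # Both copies erased: nothing reaches the output.
        "no_decision": q * q,
        # At least one copy incorrect, the other incorrect or erased.
        "incorrect": r * r + 2 * q * r,
        # One copy correct, one incorrect: the OR cannot tell them apart.
        "conflict": 2 * p * r,
    }


if __name__ == "__main__":
    # Illustrative values only; they satisfy p > q > r.
    for name, value in duplicated_outcomes(0.90, 0.09, 0.01).items():
        print(f"{name}: {value:.4f}")
```

With these sample values the probability of a correct output rises from 0.90 for a single element to 0.972 for the duplicated pair, and the remaining error is dominated by the terms containing r, which is why r should be almost zero.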

There are also means by which more conventional logic circuitry may be
made amenable to failure-erasure techniques, but not without some compromises
on achievable performance.

Let us assume that every logic circuit in a system also generates the
complement of its primary output signal. If the failure mode of a circuit is
considered to be the situation where the primary signal of the circuit is
identical to the complementary signal, then this may be utilized as the single
necessary condition for a "no decision", and error detection may be
accomplished under the assumed failure-mode condition.

The logic circuit which can fulfill the output requirements is shown in
Figure 21. Designed with a view toward development of a universal micropower
logic element, it has two sets of inputs: they are called the AND/NAND
input section and the OR/NOR input section. When the circuit is used as an
AND or NAND gate, the standard terms are connected to the AND/NAND input diodes,
with the complementary terms connected to the OR/NOR input diodes. The reverse
connections are made when the circuit is used for an OR or NOR gate. As an
example, let it be assumed that the expression (A · B) is desired. Signal A
and signal B are connected individually to diodes D1 and D2, while the
complements of A and B are connected to D3 and D4. DeMorgan's theorem and the
cross-coupling assure complementary transistor outputs, such that output Y
provides the (A · B) term desired and output X the complementary (NAND) term,
the complement of (A · B). In the OR/NOR mode, output X provides the OR term
and output Y the NOR term. Although the circuit requires the availability of
complementary input signals, it also provides similar output signals for use
in subsequent levels of logic.
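
The role of DeMorgan's theorem in producing complementary outputs can be checked with a purely Boolean model. The Python sketch below is not a model of the circuit of Figure 21: it ignores the transistor cross-coupling, and the function names and argument order are hypothetical, chosen only to mirror the connections described above.

```python
from itertools import product
from typing import Tuple


def and_nand_mode(a: int, a_bar: int, b: int, b_bar: int) -> Tuple[int, int]:
    """Return (X, Y), where Y is the AND term and X the NAND term."""
    y = a & b          # true terms on the AND/NAND input diodes
    x = a_bar | b_bar  # complements on the OR/NOR input diodes (DeMorgan)
    return x, y


def or_nor_mode(a: int, a_bar: int, b: int, b_bar: int) -> Tuple[int, int]:
    """Return (X, Y), where X is the OR term and Y the NOR term."""
    x = a | b          # true terms on the OR/NOR input diodes
    y = a_bar & b_bar  # complements on the AND/NAND input diodes (DeMorgan)
    return x, y


def outputs_complementary() -> bool:
    """Check that X and Y differ for every valid complementary input pair."""
    for a, b in product((0, 1), repeat=2):
        for gate in (and_nand_mode, or_nor_mode):
            x, y = gate(a, 1 - a, b, 1 - b)
            if x == y:
                return False
    return True
```

For valid inputs, where each signal is accompanied by its true complement, `outputs_complementary()` confirms that X and Y never coincide in either mode, so an X = Y condition can be reserved to indicate a failure.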

A  flip-flop  may  be  formed  by  removing  point  Z  from  the  positive  power 
buss  and  attaching  it  to  output  Y  while  output  X  is  connected  to  one  of  the 
OR/NOR  input  diodes. 

Circuits of this configuration can display power dissipations of less
than 10 μW and switching times less than 0.5 ms, these times being primarily
dependent upon input diode capacitance and desired noise immunity.

Using this circuit, the possible sets of output states are:

     X     Y     State

     0     0     failure-mode
     0     1     "0"
     1     0     "1"
     1     1     failure-mode

If we now place an MOS field-effect transistor across the circuit as
shown in Figure 22, a signal current, I, is present only when X = 1 and
Y = 0. There is no active response by the FET during failure-mode conditions
and at "0" signal because of insufficient biasing signals or complete reverse
bias. By utilizing the failure-erasure technique detailed in the paper, the
failure-masking configuration of the preceding logic example becomes that
shown in Figure 23, where the failure-erasure OR circuit is composed of the
two MOS FETs connected together at their drains.
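
The behavior of the failure-erasure OR can be sketched at the logic level. The Python model below assumes an idealized FET that conducts only when X = 1 and Y = 0, as stated above, and treats the tied drains as a simple OR of the two signal currents. The function names are hypothetical, and analog effects such as bias margins are ignored.

```python
from typing import Tuple

Outputs = Tuple[int, int]  # (X, Y) logic levels of one logic element


def classify(xy: Outputs) -> str:
    """Interpret an (X, Y) pair: equal outputs indicate the failure mode."""
    x, y = xy
    if x == y:
        return "failure-mode"
    return "1" if (x, y) == (1, 0) else "0"


def fet_conducts(xy: Outputs) -> bool:
    """Idealized MOS FET: signal current flows only when X = 1 and Y = 0."""
    return xy == (1, 0)


def failure_erasure_or(a: Outputs, b: Outputs) -> bool:
    """Two FETs with drains tied together: current if either one conducts."""
    return fet_conducts(a) or fet_conducts(b)


if __name__ == "__main__":
    # A copy in the failure mode is masked by its healthy duplicate.
    print(failure_erasure_or((1, 0), (0, 0)))  # healthy "1", failed copy
    print(failure_erasure_or((0, 1), (1, 1)))  # healthy "0", failed copy
    # A copy that fails into the signal state is not masked.
    print(failure_erasure_or((0, 1), (1, 0)))
```

In this model a copy that fails with X = Y contributes no current, so the surviving copy alone determines the output. A copy that fails into the X = 1, Y = 0 state forces a current regardless of the other copy, which is the unmasked case.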

The MOS FET is a desirable circuit element for these configurations
because of the large voltage differences required between source and gate
before the unit is activated, as well as the exceptionally high input
impedances which these transistors display, permitting high degrees of
isolation between compared outputs.

It is obvious that an MOS FET failure or a failure of the logic element
in its signal state will compromise the system. Variations of this technique
can, of course, be applied to the standby-redundancy schemes, which require
failure detection as an initial procedure before the faulty circuit has its
power turned off and the standby unit activated.


CONCLUSIONS


A  redundancy  technique  has  been  presented  which  requires  mere  duplication 
of  elements.  In  order  to  bring  this  technique  into  practice  the  circuits  used 
must  have  certain  signaling  characteristics.  These  characteristics  have  been 
reviewed  and  detailed.  It  is  hoped  that,  in  light  of  the  results  of  this 
paper,  effort  will  be  forthcoming  in  the  areas  of  device  research  and  circuit 
design  to  develop  other  techniques  of  realizing  failure-erasure  circuitry  such 
that future space missions can be made more reliable.

Fig. 1. Basic Shannon-Moore configurations
Fig. 2. "Quadded" logic diagram (adapted from Wilcox and Mann, op cit)
Fig. 3. Majority voting redundancy configurations
Fig. 4. Transient-failure restoring scheme
Fig. 5. Block diagram of a digital communication system
Fig. 6. Communication analog redundant circuit
Fig. 7. A Hamming code redundant circuit
Fig. 9. Sub-unit breakdown
Fig. 10. Redundant selection circuit
Fig. 12. Redundant system
Fig. 13. A or B system model
Fig. 14. OR element operational model
Fig. 16. Typical digital transfer characteristic
Fig. 17. Linear system transfer characteristic
Fig. 19. Stability profile (from Ref. 10)
Fig. 20. Channel analog for parametron
Fig. 22. Failure-erasure logic element
Fig. 23. Failure masking logic element